You can also read my Notion page, which goes into more detail: getchar source code analysis

getchar source code analysis

Call chain

getchar()
_IO_getc_unlocked
__uflow
_IO_default_uflow
_IO_file_underflow (also known as _IO_new_file_underflow)
_IO_SYSREAD (fp, fp->_IO_buf_base, fp->_IO_buf_end - fp->_IO_buf_base); // key function

Flow analysis

This section only covers the common path through getchar. First, getchar calls _IO_getc_unlocked.

int
getchar (void)
{
  int result;
  if (!_IO_need_lock (stdin))
    return _IO_getc_unlocked (stdin);
  _IO_acquire_lock (stdin);
  result = _IO_getc_unlocked (stdin);
  _IO_release_lock (stdin);
  return result;
}

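Calling __uflow

_IO_getc_unlocked is a macro, defined as follows. When no buffered bytes remain (_IO_read_ptr >= _IO_read_end), it calls __uflow.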

#define _IO_getc_unlocked(_fp) __getc_unlocked_body (_fp)
#define __getc_unlocked_body(_fp) \
  (__glibc_unlikely ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end) \
   ? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)
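
To make the shape of this macro easier to see, here is a minimal sketch in plain C. Everything in it (struct rd_buf, rd_slow_path, rd_getc, rd_init) is a hypothetical stand-in and not glibc code; the slow path simply reports EOF instead of refilling the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* EOF */

/* Hypothetical stand-in for the read pointers inside glibc's FILE. */
struct rd_buf {
    const unsigned char *read_ptr;   /* next byte to hand out */
    const unsigned char *read_end;   /* one past the last buffered byte */
    int slow_calls;                  /* how often the slow path was taken */
};

/* Plays the role of __uflow. This toy has nothing to refill from,
   so it only counts the call and reports EOF. */
static int rd_slow_path(struct rd_buf *s)
{
    s->slow_calls++;
    return EOF;
}

/* Same shape as __getc_unlocked_body: hand out a buffered byte if one
   is left, otherwise fall into the slow path. */
static int rd_getc(struct rd_buf *s)
{
    return (s->read_ptr >= s->read_end) ? rd_slow_path(s) : *s->read_ptr++;
}

/* Point the toy stream at an existing byte array. */
static void rd_init(struct rd_buf *s, const char *data, size_t len)
{
    assert(data != NULL);
    s->read_ptr = (const unsigned char *) data;
    s->read_end = s->read_ptr + len;
    s->slow_calls = 0;
}
```

With a buffer holding abc, three calls to rd_getc are served from memory and only the fourth reaches the slow path, which is the situation in which the real macro calls __uflow.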

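Inside __uflow. In the debugger, fp at this point points to _IO_2_1_stdin_:

[Screenshot: debugger view of fp]

The state of _IO_2_1_stdin_ at this point:

[Screenshot: fields of _IO_2_1_stdin_]

Observation shows that for an ordinary getchar call, fp satisfies none of the if branches below, so execution falls through to the final _IO_UFLOW (fp).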

int
__uflow (FILE *fp)
{
  if (_IO_vtable_offset (fp) == 0 && _IO_fwide (fp, -1) != -1)
    return EOF;

  if (fp->_mode == 0)
    _IO_fwide (fp, -1);
  if (_IO_in_put_mode (fp))
    if (_IO_switch_to_get_mode (fp) == EOF)
      return EOF;
  if (fp->_IO_read_ptr < fp->_IO_read_end)
    return *(unsigned char *) fp->_IO_read_ptr++;
  if (_IO_in_backup (fp))
    {
      _IO_switch_to_main_get_area (fp);
      if (fp->_IO_read_ptr < fp->_IO_read_end)
        return *(unsigned char *) fp->_IO_read_ptr++;
    }
  if (_IO_have_markers (fp))
    {
      if (save_for_backup (fp, fp->_IO_read_end))
        return EOF;
    }
  else if (_IO_have_backup (fp))
    _IO_free_backup_area (fp);
  return _IO_UFLOW (fp);
}

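Calling _IO_default_uflow

__uflow then calls _IO_UFLOW, which dispatches through the vtable; here it resolves to _IO_default_uflow (how the target is located is explained below for _IO_new_file_underflow, so it is skipped here).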

int
_IO_default_uflow (FILE *fp)
{
  int ch = _IO_UNDERFLOW (fp);
  if (ch == EOF)
    return EOF;
  return *(unsigned char *) fp->_IO_read_ptr++;
}

This function calls _IO_UNDERFLOW, which is also a vtable dispatch; it expands to JUMP0 (__underflow, FP).

#define _IO_UNDERFLOW(FP) JUMP0 (__underflow, FP)
#define JUMP0(FUNC, THIS) (_IO_JUMPS_FUNC(THIS)->FUNC) (THIS)
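
The dispatch pattern behind these two macros can be sketched with an ordinary table of function pointers. The names below (struct vt_file, struct vt_jumps, the VT_ macros, vt_peek) are hypothetical and simplified; the real _IO_JUMPS_FUNC also validates the vtable pointer, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* EOF */

struct vt_file;

/* Hypothetical one-slot jump table; the real one has many more slots. */
struct vt_jumps {
    int (*underflow) (struct vt_file *);
};

/* Hypothetical stream object that carries a pointer to its jump table. */
struct vt_file {
    const struct vt_jumps *vtable;
    int next_byte;               /* what this toy stream will "read" */
};

/* Same layering as _IO_UNDERFLOW -> JUMP0 -> _IO_JUMPS_FUNC. */
#define VT_JUMPS_FUNC(THIS)  ((THIS)->vtable)
#define VT_JUMP0(FUNC, THIS) (VT_JUMPS_FUNC (THIS)->FUNC) (THIS)
#define VT_UNDERFLOW(FP)     VT_JUMP0 (underflow, FP)

/* Two possible implementations of the underflow slot. */
static int vt_file_underflow (struct vt_file *fp) { return fp->next_byte; }
static int vt_eof_underflow (struct vt_file *fp) { (void) fp; return EOF; }

static const struct vt_jumps vt_file_jumps = { .underflow = vt_file_underflow };
static const struct vt_jumps vt_eof_jumps  = { .underflow = vt_eof_underflow };

/* Caller side: which function runs depends only on fp->vtable. */
static int vt_peek (struct vt_file *fp)
{
    assert (fp->vtable != NULL);
    return VT_UNDERFLOW (fp);
}
```

Swapping the vtable pointer changes which function vt_peek ends up calling without touching the call site, which is why the next step is to check which table the stdin object actually points to.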

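At this point fp points to _IO_2_1_stdin_, whose vtable pointer points to _IO_file_jumps:

[Screenshot: vtable field of _IO_2_1_stdin_]

The contents of _IO_file_jumps are shown below; its __underflow entry is exactly _IO_new_file_underflow:

[Screenshot: contents of _IO_file_jumps]

Calling _IO_new_file_underflow (the key function)

The source code is as follows.

For a detailed walkthrough of this function, see "fread source implementation", which explains it clearly.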

int
_IO_new_file_underflow (FILE *fp)
{
  ssize_t count;

  /* C99 requires EOF to be "sticky". */
  if (fp->_flags & _IO_EOF_SEEN)
    return EOF;

  if (fp->_flags & _IO_NO_READS)
    {
      fp->_flags |= _IO_ERR_SEEN;
      __set_errno (EBADF);
      return EOF;
    }
  if (fp->_IO_read_ptr < fp->_IO_read_end)
    return *(unsigned char *) fp->_IO_read_ptr;

  if (fp->_IO_buf_base == NULL)
    {
      /* Maybe we already have a push back pointer. */
      if (fp->_IO_save_base != NULL)
        {
          free (fp->_IO_save_base);
          fp->_flags &= ~_IO_IN_BACKUP;
        }
      _IO_doallocbuf (fp);
    }

  /* FIXME This can/should be moved to genops ?? */
  if (fp->_flags & (_IO_LINE_BUF|_IO_UNBUFFERED))
    {
      /* We used to flush all line-buffered stream. This really isn't
         required by any standard. My recollection is that
         traditional Unix systems did this for stdout. stderr better
         not be line buffered. So we do just that here
         explicitly. --drepper */
      _IO_acquire_lock (stdout);

      if ((stdout->_flags & (_IO_LINKED | _IO_NO_WRITES | _IO_LINE_BUF))
          == (_IO_LINKED | _IO_LINE_BUF))
        _IO_OVERFLOW (stdout, EOF);

      _IO_release_lock (stdout);
    }

  _IO_switch_to_get_mode (fp);

  /* This is very tricky. We have to adjust those
     pointers before we call _IO_SYSREAD () since
     we may longjump () out while waiting for
     input. Those pointers may be screwed up. H.J. */
  fp->_IO_read_base = fp->_IO_read_ptr = fp->_IO_buf_base;
  fp->_IO_read_end = fp->_IO_buf_base;
  fp->_IO_write_base = fp->_IO_write_ptr = fp->_IO_write_end
    = fp->_IO_buf_base;

  count = _IO_SYSREAD (fp, fp->_IO_buf_base,
                       fp->_IO_buf_end - fp->_IO_buf_base);
  if (count <= 0)
    {
      if (count == 0)
        fp->_flags |= _IO_EOF_SEEN;
      else
        fp->_flags |= _IO_ERR_SEEN, count = 0;
    }
  fp->_IO_read_end += count;
  if (count == 0)
    {
      /* If a stream is read to EOF, the calling application may switch active
         handles. As a result, our offset cache would no longer be valid, so
         unset it. */
      fp->_offset = _IO_pos_BAD;
      return EOF;
    }
  if (fp->_offset != _IO_pos_BAD)
    _IO_pos_adjust (fp->_offset, count);
  return *(unsigned char *) fp->_IO_read_ptr;
}

The key line is the one below. It calls _IO_SYSREAD, which ultimately invokes the read system call.

count = _IO_SYSREAD (fp, fp->_IO_buf_base,
fp->_IO_buf_end - fp->_IO_buf_base);

Summary

In essence, when the buffer is empty, getchar ends up making the following call:

read(0,fp->_IO_buf_base,fp->_IO_buf_end - fp->_IO_buf_base)

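One might ask: doesn't getchar read only one character at a time? The length requested here is the size of the whole buffer.

The part that limits it to one character is actually in _IO_default_uflow: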

int
_IO_default_uflow (FILE *fp)
{
  int ch = _IO_UNDERFLOW (fp);
  if (ch == EOF)
    return EOF;
  return *(unsigned char *) fp->_IO_read_ptr++;
}

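After the _IO_SYSREAD inside _IO_new_file_underflow returns, this function advances _IO_read_ptr by only one. The bytes between _IO_read_base and _IO_read_ptr are the ones actually consumed from the buffer, so exactly one character has been read. Everything from _IO_read_ptr up to _IO_read_end stays in the buffer for later calls.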
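
The pointer movement described here can be modeled in a few lines of C. This is only a sketch under simplifying assumptions: struct buf_stream, sketch_underflow and sketch_uflow are hypothetical names, a memcpy from an in-memory string plays the role of _IO_SYSREAD, and flags, locking and error handling are left out.

```c
#include <assert.h>
#include <stdio.h>    /* EOF */
#include <string.h>   /* memcpy */

#define SKETCH_BUFSZ 16

/* Hypothetical miniature FILE: one buffer plus the three read pointers. */
struct buf_stream {
    unsigned char buf[SKETCH_BUFSZ];
    unsigned char *read_base, *read_ptr, *read_end;
    const char *src;    /* pretend kernel-side data not yet read */
    size_t src_len;
};

/* Stand-in for _IO_new_file_underflow: refill as much of the buffer as
   possible, then return the first byte WITHOUT advancing read_ptr. */
static int sketch_underflow (struct buf_stream *s)
{
    size_t n = s->src_len < SKETCH_BUFSZ ? s->src_len : SKETCH_BUFSZ;

    s->read_base = s->read_ptr = s->read_end = s->buf;
    if (n == 0)
        return EOF;
    memcpy (s->buf, s->src, n);   /* plays the role of _IO_SYSREAD */
    s->src += n;
    s->src_len -= n;
    s->read_end += n;
    return *s->read_ptr;
}

/* Stand-in for _IO_default_uflow: consume exactly one byte. */
static int sketch_uflow (struct buf_stream *s)
{
    int ch = sketch_underflow (s);
    if (ch == EOF)
        return EOF;
    assert (s->read_ptr < s->read_end);
    return *s->read_ptr++;
}
```

With abc as the source, a single sketch_uflow call returns a, leaves read_ptr one byte past read_base, and leaves read_end three bytes past it: the refill was large, the consumption was one byte.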

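Before getchar reads:

[Screenshot: _IO_2_1_stdin_ before the read]

After getchar reads (we typed abc):

[Screenshot: _IO_2_1_stdin_ after the read]

The program reads all of abc into the buffer at address 0x5555555592a0 and sets _IO_read_base to point at it, while _IO_read_ptr points one byte past it. At this point only the a between _IO_read_base and _IO_read_ptr has been consumed (this is how getchar implements reading a single byte).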
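
The same behavior can be observed without a debugger. The sketch below is an assumption-laden experiment, not part of the original analysis: it assumes a POSIX system with a buffered stdio such as glibc, and the helper name first_char_then_raw_read is made up. It replaces stdin with a pipe that already holds abc, calls getchar once, and then issues a raw read on file descriptor 0 to see what is left in the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns what the first getchar() yields. *left_in_kernel receives the
   result of a raw read(0, ...) issued right afterwards. */
static int first_char_then_raw_read (ssize_t *left_in_kernel)
{
    int fds[2];
    char rest[8];
    int c;

    assert (left_in_kernel != NULL);
    if (pipe (fds) != 0)
        return EOF;
    if (write (fds[1], "abc", 3) != 3)
        return EOF;
    close (fds[1]);   /* no writer left: an empty pipe now reads as EOF */

    /* Make the read end of the pipe become file descriptor 0. */
    if (fds[0] != STDIN_FILENO)
    {
        if (dup2 (fds[0], STDIN_FILENO) < 0)
            return EOF;
        close (fds[0]);
    }

    c = getchar ();   /* stdio refills its buffer here */

    /* If stdio had fetched only one byte, "bc" would still be in the pipe. */
    *left_in_kernel = read (STDIN_FILENO, rest, sizeof rest);
    return c;
}
```

If the analysis above holds, the first getchar returns a, the raw read finds the pipe already drained, and two further getchar calls still return b and c from the stdio buffer.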